Abstract

We improve the lower bound on the extremal version of the Maximum Agreement
Subtree problem. Namely, we prove that any two binary trees on the same n
leaves have subtrees on a common set of more than c log log n leaves that are
homeomorphic, with the homeomorphism being the identity on the leaves.

1 Introduction



A phylogenetic X-tree is a binary tree in which the leaves are labelled bijectively with
labels from a set X (usually {1,2, ...,n}) and internal vertices are unlabelled. Two 
phylogenetic X-trees are considered the same, if there is a label-preserving graph 
isomorphism between them. 

If T is a phylogenetic X-tree and $Y \subseteq X$ is a set of labels, then the induced binary
subtree $T|_Y$ is defined as follows: (a) take the subtree induced by Y in T (the minimal
subtree connecting the leaves labelled by Y), and (b) substitute paths in which all
internal vertices have degree 2 by edges. $T|_Y$ is a phylogenetic Y-tree (see Fig. 1).
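
Steps (a) and (b) can be sketched in a few lines of Python. The sketch assumes an encoding that is not part of this note: a tree is a nested tuple whose leaves are the labels (an unrooted tree is rooted arbitrarily for this purpose), and the function name `restrict` is hypothetical.

```python
def restrict(tree, keep):
    """Restrict a nested-tuple tree to the labels in `keep`.

    Assumed encoding (not from the paper): a leaf is a bare label and an
    internal vertex is a tuple of subtrees. Returns None if no label survives.
    """
    if not isinstance(tree, tuple):
        # A leaf survives only if its label is kept.
        return tree if tree in keep else None
    # Step (a): keep only the parts of the tree that lead to a kept leaf.
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        # Step (b): a vertex left with a single subtree has degree 2, so suppress it.
        return kids[0]
    return tuple(kids)


example = ((1, 2), (3, (4, 5)))
print(restrict(example, {1, 3, 5}))  # → (1, (3, 5))
```

The recursion suppresses degree-2 vertices as it returns, so no separate clean-up pass is needed.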

If $|Y| = 4$, the induced binary subtree is often identified with an unordered
partition of Y into two two-element sets, obtained by removing the (unique) internal
edge of $T|_Y$. This partition is known as a quartet split. It has been known that the $\binom{n}{4}$
quartet splits of a phylogenetic X-tree with $|X| = n$ determine the phylogenetic tree
through a polynomial time algorithm. This was first observed in 1981 by Colonius
and Schultze [2], in the context of stemmatology, and was developed further in 1986
by Bandelt and Dress [1].
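
As an illustration, the quartet split of four labels can be read off a tree directly. The sketch below assumes the same hypothetical nested-tuple encoding of a binary tree (leaves are labels, internal vertices are tuples); the function names are ours, and the search relies on the fact that in a binary tree some subtree contains exactly two of the four labels.

```python
from itertools import combinations


def leaf_set(tree):
    """All labels below a vertex of a nested-tuple tree (assumed encoding)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def quartet_split(tree, quartet):
    """Return the split of `quartet` as a frozenset of two 2-element frozensets."""
    quartet = frozenset(quartet)
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            side = leaf_set(node) & quartet
            if len(side) == 2:
                # The edge above this subtree separates `side` from the rest.
                return frozenset([side, quartet - side])
            stack.extend(node)
    raise ValueError("no split found; is the tree binary and does it contain the quartet?")


def all_quartet_splits(tree):
    """Map each 4-subset of the leaves to its quartet split."""
    return {frozenset(q): quartet_split(tree, q)
            for q in combinations(leaf_set(tree), 4)}
```

For a tree on n leaves, `all_quartet_splits` returns one entry per 4-subset, that is, $\binom{n}{4}$ entries. This brute-force enumeration is only meant for small examples.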

An important algorithmic problem, known as the Maximum Agreement Subtree 
Problem, is the following: given two phylogenetic X-trees, find a common induced 
binary subtree of the largest possible size. 




[Figure 1: trees T, F, and G]

Figure 1: For X = {1, 2, 3, 4, 5, 6} and the two phylogenetic X-trees shown (T and
F), a maximum agreement subtree is the phylogenetic tree $G = T|_Y = F|_Y$ shown,
where Y = {2, 3, 4, 6}.

This problem has a history that spans more than 25 years, from papers in the
early 1980s by Gordon [5], and Finden and Gordon [3], to its implementation in the
late 1990s in the widely used phylogenetic software PAUP [11]. Somewhat surprisingly,
this problem can be solved in polynomial time [10] (see also [4] and [7]).

Here we focus on the extremal version of the problem. Let mast(n) denote the
smallest order (number of leaves, or vertices) of the maximum agreement subtree of
two phylogenetic X-trees with $|X| = n$. In 1992, Kubicka, Kubicki, and McMorris [6]
showed that $c_1 (\log\log n)^{1/2} < \mathrm{mast}(n) < c_2 \log n$ with some explicit constants.

The purpose of our note is to remove the square root sign from the lower bound.
This is achieved by changing the order of two combinatorial steps, one resulting in
taking the logarithm twice, the other in taking a square root. Once the square root
is taken first, it is absorbed into the constant after the two logarithms and is no
longer visible.
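
The effect of taking the square root before the two logarithms can be made explicit by a short calculation. We assume here that log denotes the logarithm to any fixed base greater than 1; the choice of base only affects the constants.

```latex
\[
  \log\log n^{1/2}
  \;=\; \log\!\left(\tfrac{1}{2}\log n\right)
  \;=\; \log\log n - \log 2
  \;\ge\; \tfrac{1}{2}\log\log n
  \qquad \text{whenever } \log n \ge 4 .
\]
```

In contrast, taking the square root after the two logarithms leaves $(\log\log n)^{1/2}$, which is of smaller order.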

First we would like to exhibit a direct connection to Ramsey theory, which might
explain the large gap between the lower and upper bounds for mast(n). Let $R_2^k(n, \ell)$
denote the smallest integer m such that for any coloration of the k-element subsets of
any m-element set with colors Red and Blue, there exists an n-element subset of the
m-element set such that every k-element subset of this n-element subset is colored Red,
or there exists an $\ell$-element subset of the m-element set such that every k-element
subset of this $\ell$-element subset is colored Blue (see Chapter 14 in [8]).

Claim. $\mathrm{mast}[R_2^4(n, 6)] \ge n$.

Proof. We first recall an observation from [1] that for $|X| = 6$, any two phylogenetic
X-trees share a quartet split. Given arbitrary phylogenetic X-trees T and F with
$|X| = R_2^4(n, 6)$, color a 4-subset of X Red if T and F define the same quartet split
on it, and Blue otherwise. No six elements of X can have all 4-subsets Blue by the previous
observation, so there are n elements of X such that all their 4-subsets are colored
Red. As a binary tree is determined by its quartet splits, these n elements span an
agreement subtree of size n, thereby establishing the Claim. □

This approach would give an explicit lower bound for mast(n) in the form of a
multiply iterated logarithm, much weaker than $c_1 (\log\log n)^{1/2}$.

Before proving our result, we quickly show $c_1 (\log\log n)^{1/2} < \mathrm{mast}(n)$ following
the approach in the 1992 paper by Kubicka, Kubicki, and McMorris [6]. Recall
that a caterpillar is a tree that has a path such that every leaf has a neighbor
on the path (for example, the tree F in Fig. 1). Let us be given two phylogenetic
X-trees T and F with $|X| = n$. As our trees are binary, the diameter of T is at least
$c_3 \log n$. Therefore T must have an induced binary caterpillar subtree with leaf set
Y, such that $|Y| > c_3 \log n$. Consider the induced binary subtree $F|_Y$, which must
have diameter $> c_4 \log\log n$. As we argued before, there is a $Z \subseteq Y$ such
that $F|_Z$ is a caterpillar and $|Z| > c_4 \log\log n$. Notice that $T|_Z = (T|_Y)|_Z$ is also a
caterpillar. Recall the Erdős-Szekeres Theorem (Ex. 14.15 in [8]) for sequences: two
sequences composed from the same $k^2 + 1$ items have either a common subsequence of
length $k + 1$, or they have a common subsequence of length $k + 1$ after reversing the
order in one sequence. As caterpillar trees can be understood as sequences of their
leaves, two caterpillar trees with the same $k^2 + 1$ leaves contain an agreement
subtree of size $k + 1$. Apply this with the largest k such that $k^2 + 1 < c_4 \log\log n$.
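
The sequence form of the Erdős-Szekeres Theorem used above can be illustrated with a small Python sketch. It is a plain quadratic dynamic program, not taken from any of the cited papers, and the function names are hypothetical. Given two orderings of the same distinct items, it returns a longest list of items that appears in both orderings in the same order, or in one ordering and in the reverse of the other.

```python
def longest_chain(ranks, increasing):
    """Indices of a longest increasing (or decreasing) subsequence of `ranks`."""
    n = len(ranks)
    if n == 0:
        return []
    length = [1] * n   # length[j]: longest chain ending at index j
    prev = [-1] * n    # prev[j]: previous index in that chain, or -1
    for j in range(n):
        for i in range(j):
            ok = ranks[i] < ranks[j] if increasing else ranks[i] > ranks[j]
            if ok and length[i] + 1 > length[j]:
                length[j] = length[i] + 1
                prev[j] = i
    j = max(range(n), key=lambda idx: length[idx])
    chain = []
    while j != -1:
        chain.append(j)
        j = prev[j]
    return chain[::-1]


def common_subsequence(p, q):
    """Return (items, same_order) for two orderings p, q of the same distinct items.

    `items` is listed in the order of p; `same_order` is False when the items
    appear in q in the reverse order.
    """
    pos = {x: i for i, x in enumerate(q)}
    ranks = [pos[x] for x in p]   # where each item of p sits in q
    same = longest_chain(ranks, increasing=True)
    opposite = longest_chain(ranks, increasing=False)
    if len(same) >= len(opposite):
        return [p[i] for i in same], True
    return [p[i] for i in opposite], False


print(common_subsequence([1, 2, 3, 4, 5], [3, 2, 1, 5, 4]))  # → ([1, 2, 3], False)
```

With $k^2 + 1$ items the returned list has at least $k + 1$ entries, as the theorem guarantees. In the example k = 2, and the three items 1, 2, 3 appear in opposite orders in the two sequences.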

Before turning to our main result, we need some definitions. We say that a
phylogenetic X-tree T is drawn on the plane if it is drawn as a plane graph. The
circumference of a phylogenetic tree drawn on the plane is the cyclic permutation of
X, the leaf set, obtained as we walk around T clockwise. This concept has been a useful
combinatorial tool elsewhere (see, for example, [9]), and we illustrate it here in Fig.
1 by noting that the circumference of this drawing of T is the cyclic permutation
(1,4,3,6,5,2).

Note that for $Y \subseteq X$ the induced binary subtree of T (by Y) has a natural
drawing following steps (a) and (b) by deleting edges and vertices from the plane
drawing, and then removing the vertex designation of vertices of degree 2, but keeping
the curve representing the path to represent the new edge. For this natural
drawing of $T|_Y$, the circumference is the circumference of the drawing of T restricted
to Y. For the tree T in Fig. 1 and the subset Y = {2,3,4,6}, the circumference
of the induced drawing of $T|_Y$ is the cyclic permutation (2,4,3,6) (the same as the
circumference of the given drawing of G), while the circumference of the induced
drawing of $F|_Y$ is the cyclic permutation (2,3,4,6).
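
The restriction of a circumference is easy to compute. The sketch below assumes that a cyclic permutation is stored as a Python tuple read clockwise from an arbitrary starting leaf; both function names are hypothetical, and rotating the smallest label to the front is just one convenient way to compare cyclic permutations.

```python
def restrict_cyclic(circ, keep):
    """Restrict a cyclic permutation (a tuple in clockwise order) to the labels in `keep`."""
    return tuple(x for x in circ if x in keep)


def canonical_rotation(circ):
    """Rotate a cyclic permutation so that its smallest label comes first."""
    i = circ.index(min(circ))
    return circ[i:] + circ[:i]


# The circumference of the drawing of T in Fig. 1, restricted to Y = {2, 3, 4, 6}.
circ_T = (1, 4, 3, 6, 5, 2)
print(canonical_rotation(restrict_cyclic(circ_T, {2, 3, 4, 6})))  # → (2, 4, 3, 6)
```

The output agrees with the circumference of the induced drawing of the restricted tree stated above.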

Theorem 1.1 For a constant c > 0, we have:

$$c \log\log n < \mathrm{mast}(n).$$

Proof. Take two arbitrary phylogenetic X-trees, T and F, with $|X| = n$ and draw
them in the plane. Cut the resulting circumferences anywhere to obtain two (linear)
permutations of X. By the Erdős-Szekeres Theorem, there is a subset $U \subseteq X$ such
that the two permutations either put U into the same linear order, or into opposite
linear orders, and $|U| > c_5 n^{1/2}$. As in the proof explained before the theorem, $T|_U$
has diameter $> c_3 \log |U| > c_6 \log n$. Therefore $T|_U$ has an induced binary subtree
which is a caterpillar, with leaf set $Y$ such that $|Y| > c_6 \log n$. Consider now the
induced binary subtree $F|_Y$. The diameter of $F|_Y$ is at least $c_3 \log |Y| > c_7 \log\log n$,
and therefore there is a $Z \subseteq Y$ such that $F|_Z = (F|_Y)|_Z$ is a caterpillar
and $|Z| > c_7 \log\log n$. Both $T|_Z$ and $F|_Z$ are caterpillars. By the choice of U, these
two caterpillars have the same or mirror image circumferences. In the second case,
starting this proof with the mirror image of the drawing of F, we can make sure
that the caterpillars $T|_Z$ and $F|_Z$ have identical circumferences. Taking the longest
path in $T|_Z$ (resp. $F|_Z$), this path partitions the $|Z| - 2$ non-endpoint leaves of
$T|_Z$ (resp. $F|_Z$) into two classes, corresponding to the two sides. We have two
2-partitions of $|Z| - 4$ or more elements into two classes; it is easy to see that some
partition classes must have at least $(|Z| - 4)/4$ elements in common, say W. Now
$T|_W = (T|_Z)|_W$ and $F|_W = (F|_Z)|_W$ are the common induced binary subtree of T
and F, and $|W| > c_8 \log\log n$. □
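
The pigeonhole step at the end of the proof can be checked mechanically. The sketch below assumes that each 2-partition is given as a pair of Python sets (a hypothetical encoding of the two sides of the longest path in each caterpillar). The four pairwise intersections cover all m elements, so the largest of them has at least m/4 elements.

```python
def largest_common_class(part_a, part_b):
    """Largest intersection of a class of `part_a` with a class of `part_b`.

    Each argument is a pair of disjoint sets covering the same ground set.
    """
    return max((a & b for a in part_a for b in part_b), key=len)


# Two 2-partitions of the same 8 elements (made-up data for illustration).
sides_T = ({1, 2, 3}, {4, 5, 6, 7, 8})
sides_F = ({1, 4, 5}, {2, 3, 6, 7, 8})
W = largest_common_class(sides_T, sides_F)
print(sorted(W))  # → [6, 7, 8]
```

Here m = 8 and the common class found has 3 elements, which is at least m/4 = 2.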

Remark. It would be interesting to see whether Theorem 1.1 can be tightened. In
particular, it is conceivable that the much stronger bound $c' \log n < \mathrm{mast}(n)$ holds,
which would be best possible, up to the constant factor.